ADF Audit Trail Developer's Guide v1.5.3 RF

The Allotrope Data Format (ADF) [[!ADF]] consists of several APIs and taxonomies. This document provides a Developer's Guide to the Allotrope Audit Trail API (ADF-AUDIT) [[!ADF-AUDIT]] for recording an audit trail of changes made to ADF files. It introduces the ADF-AUDIT API and illustrates it with code examples. The API uses classes and properties that are based on their definitions in the ADF Audit Trail Ontology [[!ADF-AUDIT-Ontology]] and in related ontologies such as the W3C Provenance Ontology [[!PROV-O]] and the Ontology of Provenance and Versioning [[!PAV]].

Disclaimer

THESE MATERIALS ARE PROVIDED "AS IS" AND ALLOTROPE EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE WARRANTIES OF NON-INFRINGEMENT, TITLE, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

This document is part of a set of specifications on the Allotrope Data Format [[!ADF]].


Introduction

The Allotrope Data Format (ADF) defines an interface for storing scientific observations from analytical chemistry. It is intended for long-term stability of archived analytical data and fast real-time access to it. The ADF Data Cube API (ADF-DC) [[!ADF-DC]] defines an interface for storing raw analytical data. The ADF-AUDIT API uses classes derived from the ADF Audit Trail Ontology [[!ADF-AUDIT-Ontology]] and related ontologies such as the W3C Provenance Ontology [[!PROV-O]] and the Ontology of Provenance and Versioning [[!PAV]].

This document provides a Developer's Guide for the ADF-AUDIT API.

The document is structured as follows: First, activating the audit trail on an ADF file is shown. The ADF file is then under audit trail, and changes to it are tracked. Second, it is shown how these changes can be read, together with the information in the audit record of who made the change, when the change was done, and why it was done. Finally, a way to provide an audit trail from an external source is shown.

Document Conventions

Namespaces

Within this document, the following namespace prefix bindings are used:

Prefix Namespace
owl:http://www.w3.org/2002/07/owl#
rdf:http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs:http://www.w3.org/2000/01/rdf-schema#
xsd:http://www.w3.org/2001/XMLSchema#
skos:http://www.w3.org/2004/02/skos/core#
dct:http://purl.org/dc/terms/
adf-audit:http://purl.allotrope.org/ontologies/audit#
adf-dc:http://purl.allotrope.org/ontologies/datacube#
adf-dp:http://purl.allotrope.org/ontologies/datapackage#
foaf:http://xmlns.com/foaf/0.1/
org:http://www.w3.org/ns/org#
prov:http://www.w3.org/ns/prov#
pav:http://purl.org/pav/
ex:http://example.com/ns#

Number Formatting

Within this document, decimal numbers will use a dot "." as the decimal mark.

Working with an active audit trail on ADF

This section introduces the core operations of the ADF-AUDIT API and illustrates them with examples. The core operations are activating the audit trail, starting a new audit record, and committing the changes, as well as later reading the audit trail for its metadata and the actual changes made.

Activating audit trail and accessing the AdfAuditTrailService

The main entry point to the Audit Trail API is the interface AdfAuditTrailService.

Given an ADF file adfFile of type AdfFile, an instance of this service may be retrieved as follows:

Java and C#:

    AdfAuditTrailService auditTrailService = adfFile.getAuditTrailService();
	

The audit trail feature is activated on an ADF file by calling

Java and C#:

    adfFile.activateAuditTrail();
	

Creating an audit record and tracking the changes

After activating the audit trail, any changes that are made to the ADF file will be tracked in an active audit record that is appended to the audit trail of the file. Any change made to the ADF file without an open audit record will throw an exception. An audit record can be opened by calling one of the record-creating methods of the AdfAuditTrailService (for example, startOperation()), depending on the kind of change activity on the ADF file.

All these methods expect an agent, a text description of the motivation, and a reference to the software with which the change is applied. The choice of the method sets the role the agent plays in the revision of the ADF file.

All these methods return a ChangeCapture object that acts as a handle to the newly created audit record in the audit trail of the ADF file. Changes to the ADF file are now recorded until commit() is called on the change capture object. This closes and returns the audit record, and the ADF file becomes again read-only until a new audit record is created.

Java and C#:

    Agent operator = ...;
    String motivation = "need to change something";
    Entity software = ...;

    ChangeCapture changeCapture = auditTrailService.startOperation(operator, motivation, software);
	// change something in the ADF file
    AuditRecord lastRecord = changeCapture.commit();
	

AuditRecord, Agent, and Entity are shape classes that are Java/C# representations of the OWL classes defined in the ADF-AUDIT and PROV ontologies. The audit record is an RDF graph in a separate dataset of the internal ADF quad store. Using these classes simplifies writing and reading the audit (meta) data from the RDF triples in the record. How to create shape classes and how to read and write them to RDF graphs is explained in a later section.

Note that the operation of closing the audit record is called commit(). This does not imply that the tracking of changes is in any way transactional. Any change is actually committed immediately, and there is no way to roll back a change with the audit trail API. If the audit record is not committed using the ChangeCapture object before the ADF file is closed - which might be intended in a long-running change - the record remains open. If an application then wants to make a different change that should be tracked in a new record, the API allows closing the active audit record without the ChangeCapture object by using the method forceCommit() of the AdfAuditTrailService. The application must state the reason for doing so and who performed the forced commit.
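A forced commit might be sketched as follows. The exact signature of forceCommit() is an assumption derived from the description above (an agent stating who forced the commit, plus a textual reason); consult the API reference for the authoritative form.

Java and C#:

    // sketch with an assumed signature forceCommit(agent, reason)
    Agent administrator = ...;
    AuditRecord orphanedRecord = auditTrailService.forceCommit(administrator,
            "closing a record left open by a long-running change");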

Reading an audit record

Now that the audit record has been created and closed, we want to read the metadata on why, when, who, and what has been changed and retrieve the original data in the ADF file that has been replaced by new data. For the data description, this can mean that statements (triples) have been added or deleted, or named graphs have been added or deleted. For the data cube, this means that data has been appended to the data cube, data has been updated in the data cube, or whole new data cubes have been added or deleted. For the data package, this means that files or folders have been created or deleted, or data has been appended to a file. Note that overwriting existing data in the file in a random-access way is not supported by the data package API.

An audit record can be retrieved by iterating over the audit records that are attached to the audit trail.

Java:

    AuditTrail auditTrail = auditTrailService.getAuditTrail();
    for (AuditRecord auditRecord : auditTrail.auditRecords()) {
        System.out.println("IRI of audit record:  " + auditRecord.id());
        System.out.println("date of audit record: " + auditRecord.created());
    }

C#:

    AuditTrail auditTrail = auditTrailService.getAuditTrail();
    foreach (AuditRecord auditRecord in auditTrail.auditRecords()) {
        System.Console.WriteLine("IRI of audit record:  " + auditRecord.id());
        System.Console.WriteLine("date of audit record: " + auditRecord.created());
    }
	

The audit records returned for the audit trail in this way do not contain the content of the audit record in the audit record dataset. The content must be retrieved separately as described later, or the RDF dataset must be exported using auditTrailService.exportAuditRecordDataset() for external analysis with SPARQL.

The last audit record can be directly read using auditTrailService.lastAuditRecord(); the active one can be obtained from the ChangeCapture handle.
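For example, the metadata of the most recent audit record can be printed directly:

Java:

    AuditRecord lastRecord = auditTrailService.lastAuditRecord();
    System.out.println("IRI of last audit record:  " + lastRecord.id());
    System.out.println("date of last audit record: " + lastRecord.created());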

Reading an audit record's metadata

The metadata of an audit record contains three main parts coming from the [[!PROV-O]] ontology:

  • the revision by auditTrailService.getAuditRecordRevision(), that describes what has changed
  • the revision activity by auditTrailService.getAuditRecordActivity(), that describes when and how the change has been done
  • the revision attributions by auditTrailService.getAuditRecordAttributions(), that describe the agents who contributed to the change.

Java:

    Revision revision = auditTrailService.getAuditRecordRevision();
    VersionedEntity revised =  (VersionedEntity) revision.revised();
    System.out.println("version of the revised entity: " + revised.version());

    Activity activity = auditTrailService.getAuditRecordActivity();
    System.out.println("revision started at: " + activity.startedAtTime());
    System.out.println("revision ended at:   " + activity.endedAtTime());

    for (Attribution attribution : auditTrailService.getAuditRecordAttributions()) {
        Agent agent = attribution.agent();
        System.out.println("revision done by/with: " + agent.id());
    }


C#:

    Revision revision = auditTrailService.getAuditRecordRevision();
    VersionedEntity revised =  (VersionedEntity) revision.revised();
    System.Console.WriteLine("version of the revised entity: " + revised.version());

    Activity activity = auditTrailService.getAuditRecordActivity();
    System.Console.WriteLine("revision started at: " + activity.startedAtTime());
    System.Console.WriteLine("revision ended at:   " + activity.endedAtTime());

    foreach (Attribution attribution in auditTrailService.getAuditRecordAttributions()) {
        Agent agent = attribution.agent();
        System.Console.WriteLine("revision done by/with: " + agent.id());
    }

Reading the audit record changes

The changes in the three parts of the ADF file can be read with the methods getAuditRecordDataCubeChanges(), getAuditRecordDataPackageChanges(), and getAuditRecordDataDescriptionChanges(), which are illustrated in the following subsections.

All these methods return a ChangeSet containing additions(), removals(), and updates(). What is added, removed, or updated depends on which part of the ADF file was changed.

Data Cube Changes

For data cube changes, the additions/removals are data cubes; data updates are updates of the data cube measure values.

Java:

	ChangeSet dcChanges = auditTrailService.getAuditRecordDataCubeChanges(auditRecordIri);
	for (Resource addedResource : dcChanges.additions()) {
		DataSet addedCube = (DataSet) addedResource;
		System.out.println("data cube added: " + addedCube.id());
	}
	for (Resource removedResource : dcChanges.removals()) {
		DataSet removedCube = (DataSet) removedResource;
		System.out.println("data cube removed: " + removedCube.id());
	}
	for (DataUpdate update : dcChanges.updates()) {
		Resource updatedCube = update.target();
		System.out.println("data cube updated: " + updatedCube.id());
		// selection of what was removed in the cube
		DataSelection oldData = (DataSelection) update.oldDataReference();
		// selection of what was added in the cube
		DataSelection newData = (DataSelection) update.newDataReference();
	}

C#:

	ChangeSet dcChanges = auditTrailService.getAuditRecordDataCubeChanges(auditRecordIri);
	foreach (Resource addedResource in dcChanges.additions()) {
		DataSet addedCube = (DataSet) addedResource;
		System.Console.WriteLine("data cube added: " + addedCube.id());
	}
	foreach (Resource removedResource in dcChanges.removals()) {
		DataSet removedCube = (DataSet) removedResource;
		System.Console.WriteLine("data cube removed: " + removedCube.id());
	}
	foreach (DataUpdate update in dcChanges.updates()) {
		Resource updatedCube = update.target();
		System.Console.WriteLine("data cube updated: " + updatedCube.id());
		// selection of what was removed in the cube
		DataSelection oldData = (DataSelection) update.oldDataReference();
		// selection of what was added in the cube
		DataSelection newData = (DataSelection) update.newDataReference();
	}
	

The data selections returned in a data update can be used to retrieve the old data using auditTrailService.getArchivedDatacubeData(). The returned data is an array of the size of the data selection. If a whole data cube has been deleted, the data selection is the full selection of the deleted data cube. Any deleted or overwritten data of a cube is archived in a one-dimensional internal archive dataset of the cube.
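Retrieving the archived data might be sketched as follows. The element type of the returned value is an assumption made for illustration; the guide only states that the method returns an array of the size of the data selection.

Java:

    // sketch: update is a DataUpdate from the changeset above;
    // the concrete array type depends on the measure's data type (assumption)
    DataSelection oldData = (DataSelection) update.oldDataReference();
    Object archivedData = auditTrailService.getArchivedDatacubeData(oldData);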

Data Package Changes

For data package changesets, the additions and removals are data package files or folders. A data update can only be done on a file, and currently this is always an append, so existing data is never overwritten and nothing needs to be archived.

Java:

    ChangeSet dpChanges = auditTrailService.getAuditRecordDataPackageChanges(auditRecordIri);
    for (Resource addedResource : dpChanges.additions()) {
        if (addedResource instanceof File) {
            File addedFile = (File) addedResource;
            System.out.println("file added: " + addedFile.id());
        } else if (addedResource instanceof Folder) {
            Folder addedFolder = (Folder) addedResource;
            System.out.println("folder added: " + addedFolder.id());
        }
    }
    for (Resource removedResource : dpChanges.removals()) {
        if (removedResource instanceof File) {
            File removedFile = (File) removedResource;
            System.out.println("file removed: " + removedFile.id());
        } else if (removedResource instanceof Folder) {
            Folder removedFolder = (Folder) removedResource;
            System.out.println("folder removed: " + removedFolder.id());
        }
    }
    for (DataUpdate update : dpChanges.updates()) {
        Resource updatedFile = update.target();
        System.out.println("file updated: " + updatedFile.id()); // always a file, never a folder
        Segment newData = (Segment) update.newDataReference();
    }

C#:

    ChangeSet dpChanges = auditTrailService.getAuditRecordDataPackageChanges(auditRecordIri);
    foreach (Resource addedResource in dpChanges.additions()) {
        if (addedResource is File) {
            File addedFile = (File) addedResource;
            System.Console.WriteLine("file added: " + addedFile.id());
        } else if (addedResource is Folder) {
            Folder addedFolder = (Folder) addedResource;
            System.Console.WriteLine("folder added: " + addedFolder.id());
        }
    }
    foreach (Resource removedResource in dpChanges.removals()) {
        if (removedResource is File) {
            File removedFile = (File) removedResource;
            System.Console.WriteLine("file removed: " + removedFile.id());
        } else if (removedResource is Folder) {
            Folder removedFolder = (Folder) removedResource;
            System.Console.WriteLine("folder removed: " + removedFolder.id());
        }
    }
    foreach (DataUpdate update in dpChanges.updates()) {
        Resource updatedFile = update.target();
        System.Console.WriteLine("file updated: " + updatedFile.id()); // always a file, never a folder
        Segment newData = (Segment) update.newDataReference();
    }
	

The deleted file can be retrieved from the archive using auditTrailService.openArchivedFile().
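Reading such a file back might look like the following sketch; the parameter of openArchivedFile() and the returned stream type are assumptions based on the description above.

Java:

    // sketch: removedFile is a File from the removals() of the changeset above;
    // the parameter and the InputStream return type are assumptions
    try (InputStream archivedContent = auditTrailService.openArchivedFile(removedFile.id().get())) {
        // process the archived file content
    }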

Data Description Changes

For data description changesets, the additions and removals are named graphs. A data update on the data description is the addition and removal of statements (triples) to/from a named graph, usually the default graph. All added/removed statements are collected in a separate named graph. An update of a single statement is always the combination of the removal of the old statement and the addition of the new statement.

Assuming that the change had been the addition of a statement to the data description default model:

Java and C#:

    ChangeCapture createStatements = auditTrailService.startOperation(agent, "reason for creating statements", someSoftware);
    // we add a single statement
    Model ddDefaultGraph = adfFile.getDataDescription();
    ddDefaultGraph.add(ResourceFactory.createResource("http://example.org/res/someThing"),
                       ResourceFactory.createProperty("http://example.org/prop/someProperty"),
                       ResourceFactory.createPlainLiteral("a value"));
    AuditRecord auditRecord = createStatements.commit();
	

This change is a DataUpdate, and the added statement is part of a named graph in the dataset of the audit record, returned as newData(). The RDF dataset of the audit record is retrieved with auditTrailService.getAuditRecordDataset(). The graph that the statement had been added to (in this case, the default graph) is the target() of the data update:

Java:

    Dataset auditRecordDataset = auditTrailService.getAuditRecordDataset(auditRecordIri);
    ChangeSet ddChanges = auditTrailService.getAuditRecordDataDescriptionChanges(auditRecordIri);
    for (DataUpdate update : ddChanges.updates()) {
        Resource targetModel = update.target();
        Resource newStatements = update.newData();
        Model addedStatementModel = auditRecordDataset.getNamedModel(newStatements.id().get());
        Statement stmt = addedStatementModel.listStatements().next();
    }

C#:

    Dataset auditRecordDataset = auditTrailService.getAuditRecordDataset(auditRecordIri);
    ChangeSet ddChanges = auditTrailService.getAuditRecordDataDescriptionChanges(auditRecordIri);
    foreach (DataUpdate update in ddChanges.updates()) {
        Resource targetModel = update.target();
        Resource newStatements = update.newData();
        Model addedStatementModel = auditRecordDataset.getNamedModel(newStatements.id().get());
        Statement stmt = addedStatementModel.listStatements().next();
    }
	

Now assuming that the changes had been the addition of a named graph to the data description dataset:

Java and C#:

    ChangeCapture createNamedGraph = auditTrailService.startOperation(agent, "reason for creating a named graph", someSoftware);
    // we create a simple named graph with a single statement
    Model namedGraph = ModelFactory.createMemModelMaker().createFreshModel();
    namedGraph.add(ResourceFactory.createResource("http://example.org/res/someThing"),
                   ResourceFactory.createProperty("http://example.org/prop/someProperty"),
                   ResourceFactory.createPlainLiteral("a value"));
    // and add it to the data description dataset
    Dataset ddSet = adfFile.getDataset();
    ddSet.addNamedModel("http://example.org/example/graph", namedGraph);
    AuditRecord auditRecord = createNamedGraph.commit();
	

In this case, the changeset contains a single addition:

Java:

    Dataset auditRecordDataset = auditTrailService.getAuditRecordDataset(auditRecordIri);
    ChangeSet ddChanges = auditTrailService.getAuditRecordDataDescriptionChanges(auditRecordIri);
    for (Resource addition : ddChanges.additions()) {
        Model addedModel = auditRecordDataset.getNamedModel(addition.id().get());
        Statement stmt = addedModel.listStatements().next();
    }

C#:

    Dataset auditRecordDataset = auditTrailService.getAuditRecordDataset(auditRecordIri);
    ChangeSet ddChanges = auditTrailService.getAuditRecordDataDescriptionChanges(auditRecordIri);
    foreach (Resource addition in ddChanges.additions()) {
        Model addedModel = auditRecordDataset.getNamedModel(addition.id().get());
        Statement stmt = addedModel.listStatements().next();
    }
	

If a named graph had been removed in an audit record, the graph is stored under a different name. To retrieve the graph, the method getArchivedNamedModel can be used. The following example shows how to retrieve the model if a graph had been removed:

Java:

    Dataset auditRecordDataset = auditTrailService.getAuditRecordDataset(auditRecordIri);
    ChangeSet ddChanges = auditTrailService.getAuditRecordDataDescriptionChanges(auditRecordIri);
    for (Resource removal : ddChanges.removals()) {
        String removedModelName = removal.id().get();
        Model removedModel = auditTrailService.getArchivedNamedModel(auditRecordIri, removedModelName);
    }

C#:

    Dataset auditRecordDataset = auditTrailService.getAuditRecordDataset(auditRecordIri);
    ChangeSet ddChanges = auditTrailService.getAuditRecordDataDescriptionChanges(auditRecordIri);
    foreach (Resource removal in ddChanges.removals()) {
        string removedModelName = removal.id().get();
        Model removedModel = auditTrailService.getArchivedNamedModel(auditRecordIri, removedModelName);
    }

Shape Classes

The audit trail API makes use of Java/C# classes that represent defined patterns of an RDF graph and thus work like a W3C shape. Several libraries with such classes for different vocabularies are made available in the data shapes projects.

The shapes based on these ontologies cover everything that is technically needed to describe the audit trails of ADF files. Each shape class is a read-only value object, so after initial creation it can no longer be modified. Objects are created with Builder classes that support a fluent-style initialization. For convenience, each package contains a Factory class to create a new builder for a shape class. Here is an example of how to create a model of a person called Tom who knows another person called Jerry, using the FOAF shape classes:

Java and C#:
    Person jerry = FoafModel.person("mailto:jerry.mouse@example.org")
                      .firstName("Jerry")
                      .lastName("Mouse")
                      .buildAll();
    Person tom = FoafModel.person("mailto:tom.cat@example.org")
                      .firstName("Tom")
                      .lastName("Cat")
                      .knows(jerry)
                      .buildAll();
	

The two shape objects are related via the knows() property. firstName() and lastName() represent some basic literal properties. In an RDF representation, this would be:

    <mailto:jerry.mouse@example.org>
        a               foaf:Agent , foaf:Person ;
        foaf:firstName  "Jerry" ;
        foaf:lastName   "Mouse" .

    <mailto:tom.cat@example.org>
        a               foaf:Agent , foaf:Person ;
        foaf:firstName  "Tom" ;
        foaf:knows      <mailto:jerry.mouse@example.org> ;
        foaf:lastName   "Cat" .
	

The shape classes often appear as results and parameters in the Audit Trail API, but their main function is that they can be read from and written to an RDF model. The audit record is very complex, and writing/reading it solely with the RDF APIs would be very error-prone. The shape classes help here. The mapping between the shape classes in Java/C# and the RDF classes and properties is done with annotations: annotations on the Java/C# class map it to one or more RDF classes, and annotations on the property methods map them to RDF properties.

Writing to RDF includes only non-empty, annotated properties and all the RDF types associated with the class. When reading from an RDF model, again all rdf:type predicates of the subject are read, and then, depending on the target class, all annotated properties on the class are read as well. The class that performs the read/write to the RDF model is Object2RDF.

Writing to an RDF graph is straightforward, but reading is more complicated. A method might declare a generic superclass as a parameter, and only at runtime can the correct subclass be known. An application usually knows what concrete class to expect, but the generic adapter class Object2RDF does not, so it only reads along those predicates of the RDF model that are actually declared in the class. To handle this problem, a ReadCallback on the generic superclass can be registered. The callback will be called before and after the build with the Builder. If the application knows (or can determine by querying the RDF model) what subclass it should instantiate instead, it can perform a read using the subclass as the target class on the current node. Shape classes implement multiple inheritance (which occurs in RDF models) by means of mixins. Thus, the instance of the superclass and the just-read instance of the subclass can be merged into a new mixin object, so that a downcast to the subclass on the returned object will not fail. The read method of Object2RDF also lets one define the scope of how deep the RDF graph is transitively traversed.
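As an illustration, a round trip through Object2RDF might be sketched as follows. The method names write() and read() and their signatures are assumptions made for illustration; only the class name Object2RDF is taken from the description above.

Java:

    // sketch with assumed signatures: serialize a shape object into a model ...
    Object2RDF mapper = new Object2RDF();
    Model model = ModelFactory.createDefaultModel();
    mapper.write(tom, model);
    // ... and read it back, naming the concrete class that is expected
    Person tomAgain = mapper.read(model, tom.id(), Person.class);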

Complete Example

The ADF First Steps "General" Example Application illustrates the Java API with one complete code example. It is contained in the file FirstSteps.java in the package org.allotrope.adf.firststeps. Analogously, there is a C# example in the file Examples\net-allotrope-adf-firststeps.sln.

Change History

Version Release Date Remarks
1.0.0 2017-06-30
  • Initial version
1.4.0 2017-10-31
  • Extended section 2.3.2.3 with new function
  • Updated versions and dates
1.4.3 RC 2018-10-11
  • Updated versions and dates
1.4.5 RF 2018-12-17
  • Updated versions and dates
1.5.0 RC 2019-12-12
  • Updated versions and dates
1.5.0 RF 2020-03-03
  • Updated HDF5 reference link
1.5.3 RF 2020-11-30
  • Updated broken reference links
  • Updated PURL and DOCS server links to relative links
  • Reformat the document header